Ground Truth Dataset: Objectionable Web Content

Abstract

Cyber parental control aims to filter objectionable web content and prevent children from being exposed to harmful content. Success in detecting and blocking such content depends heavily on the accuracy of the topic model. A reliable ground truth dataset is essential for building effective cyber parental control models and for validating new detection methods. Here, the ground truth is the labeling of the websites in the dataset as objectionable or unobjectionable. The lack of publicly accessible datasets with a reliable ground truth has prevented a fair and coherent comparison of the different methods proposed in the field of cyber parental control. This paper presents a ground truth dataset that contains 8000 labelled websites: 4000 objectionable and 4000 unobjectionable. These websites consist of more than 2 million web pages. Creating the dataset involved a few phases, including data collection, extraction, and labeling. Finally, the presence of bias in the labeling is addressed using the kappa coefficient. The dataset is available in the Mendeley repository.
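The abstract names the kappa coefficient as the measure used to check the labels for bias, but it does not say which variant was used or how many annotators were involved. As a rough illustration only, the sketch below computes Cohen's kappa for two annotators who each assign a binary label to the same websites. The function name `cohen_kappa` and the label lists are hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Assumption: two annotators and categorical labels. The paper does not
    state which kappa variant it used.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    p_e = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for ten websites: 1 = objectionable, 0 = unobjectionable.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
annotator_2 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the annotators agree on 8 of 10 websites, and chance agreement is 0.5, so kappa comes out at about 0.6. Values near 1 indicate agreement well beyond chance, and values near 0 indicate agreement no better than chance.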

Similar resources

Collecting a Ground Truth Dataset for OpenStreetMap

The quality of OpenStreetMap (OSM) and volunteered geographic information (VGI) in general has already been discussed extensively in the literature. Researchers have looked at this issue from different angles such as credibility [2], trust [1], provenance [12, 9], precision [4], and communities [5]. Comparative studies often use commercial datasets or datasets from a national mapping agencies f...

Specialized Web Robot for Objectionable Web Content Classification

This paper proposes a specialized Web robot to automatically collect objectionable Web contents for use in an objectionable Web content classification system, which creates the URL database of objectionable Web contents. It aims at shortening the update period of the DB, increasing the number of URLs in the DB, and enhancing the accuracy of the information in the DB. Keywords—Web robot, objecti...

Filtering objectionable internet content

As the Internet has evolved, it has become an information, entertainment, retail, and communication source that millions of people use as a matter of routine. Given the diversity of views and the ability to post any kind of information on the Internet, very often, material that is considered objectionable can be easily accessed on the Web. This is particularly problematic when children are able...

A Synchronization Ground Truth for the Jiku Mobile Video Dataset

This paper introduces and describes a manually generated synchronization ground truth, accurate to the level of the audio sample, for the Jiku Mobile Video Dataset, a dataset containing hundreds of videos recorded by mobile users at different events with drama, dancing and singing performances. It aims at encouraging researchers to evaluate the performance of their audio, video, or multimodal s...

Realistic CG Stereo Image Dataset with Ground Truth Disparity Maps

Stereo matching is one of the most active research areas in computer vision. While a large number of algorithms for stereo correspondence have been developed, research in some branches of the field has been constrained by the small number of available stereo datasets with ground truth disparity maps. Having available a large dataset of stereo images with ground truth disparity maps would boos...

Journal

Journal title: Data

Year: 2022

ISSN: 2306-5729

DOI: https://doi.org/10.3390/data7110153